Data Augment

Augment

class openspeech.data.audio.augment.JoiningAugment[source]

Augments data by concatenating audio signals.

Inputs: signal
  • signal: np.ndarray [shape=(n,)] audio time series

Returns: signal
  • signal: concatenated signal
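As a rough illustration of the concatenation idea, the following minimal sketch joins several 1-D signals end to end with NumPy. The helper name `join_signals` and the tuple-of-signals input are assumptions made for this example; they are not taken from the openspeech source.

```python
import numpy as np


def join_signals(signals):
    """Concatenate 1-D audio signals end to end (hypothetical helper)."""
    # Each element is converted to a float32 array before joining.
    return np.concatenate([np.asarray(s, dtype=np.float32) for s in signals])


first = np.array([0.1, 0.2, 0.3], dtype=np.float32)
second = np.array([0.4, 0.5], dtype=np.float32)
joined = join_signals((first, second))
print(joined.shape)  # (5,)
```

The joined signal keeps the samples of each input in order, so its length is the sum of the input lengths.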

class openspeech.data.audio.augment.NoiseInjector(noise_dataset_dir: str, sample_rate: int = 16000, noise_level: float = 0.7)[source]

Provides noise injection for noise augmentation.

The noise augmentation process is as follows:

  1. Randomly sample noise_size audio files from the dataset
  2. Extract noise from the sampled audio_paths
  3. Add the noise to the input signal

Parameters
  • noise_dataset_dir (str) – path of noise dataset

  • sample_rate (int) – sampling rate

  • noise_level (float) – level of noise

Inputs: signal
  • signal: signal from audio file

Returns: signal
  • signal: signal with noise added
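The final step of the process described above (adding noise to the signal) could look roughly like the sketch below. It is not the openspeech implementation: the function name `inject_noise`, the random crop or zero-padding of the noise clip, and the choice to draw the scale uniformly from [0, noise_level] are all assumptions made for illustration. Loading noise files from noise_dataset_dir is omitted.

```python
import numpy as np


def inject_noise(signal, noise, noise_level=0.7, rng=None):
    """Add a scaled noise clip to a signal (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    signal = np.asarray(signal, dtype=np.float32)
    noise = np.asarray(noise, dtype=np.float32)
    n = len(signal)

    if len(noise) >= n:
        # Assumption: take a random crop of the noise matching the signal length.
        start = int(rng.integers(0, len(noise) - n + 1))
        segment = noise[start:start + n]
    else:
        # Assumption: zero-pad a short noise clip up to the signal length.
        segment = np.zeros(n, dtype=np.float32)
        segment[:len(noise)] = noise

    # Assumption: the applied level is drawn uniformly from [0, noise_level].
    scale = float(rng.uniform(0.0, noise_level))
    return signal + scale * segment


rng = np.random.default_rng(0)
clean = np.zeros(16000, dtype=np.float32)
noise_clip = rng.standard_normal(48000).astype(np.float32)
noisy = inject_noise(clean, noise_clip, noise_level=0.7, rng=rng)
print(noisy.shape)
```

Passing an explicit `rng` keeps the augmentation reproducible, which can help when debugging a data pipeline.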

class openspeech.data.audio.augment.SpecAugment(freq_mask_para: int = 18, time_mask_num: int = 10, freq_mask_num: int = 2)[source]

Provides SpecAugment, a simple data augmentation method for speech recognition. The method was proposed in https://arxiv.org/abs/1904.08779

Parameters
  • freq_mask_para (int) – maximum frequency masking length

  • time_mask_num (int) – how many times to apply time masking

  • freq_mask_num (int) – how many times to apply frequency masking

Inputs: feature_vector
  • feature_vector (torch.FloatTensor): feature vector from audio file.

Returns: feature_vector
  • feature_vector: masked feature vector.
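To make the masking parameters concrete, here is a minimal sketch of time and frequency masking. It uses NumPy arrays instead of torch tensors to stay dependency-light, and it assumes the feature is laid out as (time, frequency). The function name `spec_augment`, the `time_mask_para` argument, and its default of one twentieth of the number of frames are assumptions made for this example, not documented behavior of the class.

```python
import numpy as np


def spec_augment(feature, freq_mask_para=18, time_mask_num=10,
                 freq_mask_num=2, time_mask_para=None, rng=None):
    """Zero out random time and frequency bands (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    # Work on a copy so the caller's array is left untouched.
    feature = np.array(feature, dtype=np.float32, copy=True)
    num_frames, num_bins = feature.shape

    if time_mask_para is None:
        # Assumption: maximum time-mask width scales with utterance length.
        time_mask_para = num_frames // 20

    # Time masking: blank out up to time_mask_para consecutive frames.
    for _ in range(time_mask_num):
        t = min(int(rng.integers(0, time_mask_para + 1)), num_frames)
        t0 = int(rng.integers(0, num_frames - t + 1))
        feature[t0:t0 + t, :] = 0.0

    # Frequency masking: blank out up to freq_mask_para consecutive bins.
    for _ in range(freq_mask_num):
        f = min(int(rng.integers(0, freq_mask_para + 1)), num_bins)
        f0 = int(rng.integers(0, num_bins - f + 1))
        feature[:, f0:f0 + f] = 0.0

    return feature


features = np.ones((200, 80), dtype=np.float32)
masked = spec_augment(features, rng=np.random.default_rng(0))
print(masked.shape)
```

Masks may overlap, so the total masked area varies from call to call.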

class openspeech.data.audio.augment.TimeStretchAugment(min_rate: float = 0.7, max_rate: float = 1.4)[source]

Time-stretches an audio series by a rate between min_rate and max_rate.

Inputs: signal
  • signal: np.ndarray [shape=(n,)] audio time series

Returns: y_stretch
  • y_stretch: np.ndarray [shape=(round(n/rate),)] audio time series stretched by the specified rate
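The relationship between the rate and the output length can be sketched as follows. This example assumes the rate is drawn uniformly from [min_rate, max_rate], which the constructor arguments suggest but the description above does not state. It also swaps in naive linear-interpolation resampling, which changes pitch along with duration; a pitch-preserving stretch such as `librosa.effects.time_stretch` would normally be used instead. The function name `naive_time_stretch` is hypothetical.

```python
import numpy as np


def naive_time_stretch(signal, min_rate=0.7, max_rate=1.4, rng=None):
    """Resample a signal to round(n / rate) samples (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    signal = np.asarray(signal, dtype=np.float32)

    # Assumption: the stretch rate is sampled uniformly per call.
    rate = float(rng.uniform(min_rate, max_rate))

    n = len(signal)
    out_len = int(round(n / rate))

    # Linear interpolation stand-in; rate > 1 shortens, rate < 1 lengthens.
    positions = np.linspace(0, n - 1, num=out_len)
    stretched = np.interp(positions, np.arange(n), signal)
    return stretched.astype(np.float32), rate


t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
stretched, rate = naive_time_stretch(tone, rng=np.random.default_rng(0))
print(len(stretched), rate)
```

Returning the sampled rate alongside the signal is a convenience for this sketch, so the output length can be checked against round(n/rate).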